Generation of Adaptive Vocabulary Lexicon for Japanese LVCSR

Author

Charles C. H. Jie

Abstract

One of the thorniest problems for large vocabulary continuous speech recognition (LVCSR) systems is the large number of out-of-vocabulary (OOV) words. This is especially true for languages like Japanese, which have many inflections, compound words, and loanwords. OOV words vary with the application domain, so it is not realistic to build a big general-purpose lexicon that includes every possible OOV word. Furthermore, embedded speech recognition systems have recently become increasingly popular, and they strongly demand economical and effective use of lexicon space. In this paper, we introduce a lexicon development model that handles different kinds of OOV words with the help of linguistic morphological knowledge. It provides unsupervised, fast vocabulary adaptation across different application domains. In experiments that adapt a lexicon of typical vocabulary, we were able to reduce the OOV rate by 40% and improve the word segmentation error rate by 27%. Moreover, the smaller the lexicon, the greater the benefit from vocabulary adaptation.

Similar resources

A hybrid language model for open-vocabulary Thai LVCSR

This paper investigates the use of a hybrid language model for open-vocabulary Thai LVCSR. Thai text is written without word boundary markers and the definition of word unit is often ambiguous due to the presence of compound words. Hence, to build open-vocabulary LVCSR, a very large lexicon is required to also handle word unit ambiguity. Pseudomorpheme (PM), a syllable-like sub-word unit specif...

Language independent and language adaptive large vocabulary speech recognition

This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Tu...

Speech Input Acoustic Analysis Phoneme Inventory Pronunciation Lexicon

This paper gives an overview of an architecture and search organization for large vocabulary, continuous speech recognition (LVCSR) at RWTH. In the first part of the paper, we describe the principle and architecture of an LVCSR system. In particular, the issues of modeling and search for phoneme based recognition are discussed. In the second part, we review the word conditioned lexical tree search...

Speech Input Acoustic Analysis Phoneme Inventory Pronunciation Lexicon Language Model

Experiments towards a Multi-language LVCSR Interface

This paper describes experiments towards a multilanguage human-computer speech interface. Our interface is designed for large vocabulary continuous speech input. For this purpose a multilingual dictation database has been collected under GlobalPhone, which is a project at the Interactive Systems Labs. This project investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, ...

Publication date: 2000
